Evaluation of ChatGPT Family of Models for Biomedical Reasoning and Classification
Recent advances in large language models (LLMs) have shown impressive ability
in biomedical question-answering, but have not been adequately investigated for
more specific biomedical applications. This study investigates the performance
of LLMs such as the ChatGPT family of models (GPT-3.5s, GPT-4) in biomedical
tasks beyond question-answering. Because no patient data can be passed to the
OpenAI API public interface, we evaluated model performance with over 10,000
samples as proxies for two fundamental tasks in the clinical domain -
classification and reasoning. The first task is classifying whether statements
of clinical and policy recommendations in scientific literature constitute
health advice. The second task is causal relation detection from the biomedical
literature. We compared LLMs with simpler models, such as bag-of-words (BoW)
with logistic regression, and fine-tuned BioBERT models. Despite the excitement
around viral ChatGPT, we found that fine-tuning for two fundamental NLP tasks
remained the best strategy. The simple BoW model performed on par with the most
complex LLM prompting. Prompt engineering required significant investment.
Comment: 28 pages, 2 tables and 4 figures. Submitting for review.
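The bag-of-words baseline described above can be sketched in a few lines with scikit-learn; the toy sentences and labels below are illustrative stand-ins, not the study's data:

```python
# Minimal sketch of a bag-of-words + logistic regression baseline for
# the "is this statement health advice?" classification task.
# The example texts and labels are invented for illustration.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = [
    "Clinicians should recommend smoking cessation to all patients.",
    "We observed an association between dose and toxicity.",
    "Patients ought to receive annual screening.",
    "The cohort included 120 participants.",
]
labels = [1, 0, 1, 0]  # 1 = health advice, 0 = not advice

# CountVectorizer builds the bag-of-words features; the pipeline
# chains it with the classifier so fit/predict work on raw text.
model = make_pipeline(CountVectorizer(), LogisticRegression())
model.fit(texts, labels)
pred = model.predict(["Physicians should advise regular exercise."])
print(pred)
```

In practice such a baseline would be trained on thousands of labeled statements and evaluated with cross-validation, but the pipeline structure is the same.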
Enrichment of the NLST and NSCLC-Radiomics computed tomography collections with AI-derived annotations
Public imaging datasets are critical for the development and evaluation of
automated tools in cancer imaging. Unfortunately, many do not include
annotations or image-derived features, complicating their downstream analysis.
Artificial intelligence-based annotation tools have been shown to achieve
acceptable performance and thus can be used to automatically annotate large
datasets. As part of the effort to enrich public data available within NCI
Imaging Data Commons (IDC), here we introduce AI-generated annotations for two
collections of computed tomography images of the chest, NSCLC-Radiomics, and
the National Lung Screening Trial. Using publicly available AI algorithms we
derived volumetric annotations of thoracic organs at risk, their corresponding
radiomics features, and slice-level annotations of anatomical landmarks and
regions. The resulting annotations are publicly available within IDC, where the
DICOM format is used to harmonize the data and achieve FAIR principles. The
annotations are accompanied by cloud-enabled notebooks demonstrating their use.
This study reinforces the need for large, publicly accessible curated datasets
and demonstrates how AI can be used to aid in cancer imaging.
Repeatability of Multiparametric Prostate MRI Radiomics Features
In this study we assessed the repeatability of the values of radiomics
features for small prostate tumors using test-retest Multiparametric Magnetic
Resonance Imaging (mpMRI) images. The premise of radiomics is that quantitative
image features can serve as biomarkers characterizing disease. For such
biomarkers to be useful, repeatability is a basic requirement: a feature's
value must remain stable between two scans acquired under unchanged conditions. We
investigated repeatability of radiomics features under various preprocessing
and extraction configurations including various image normalization schemes,
different image pre-filtering, 2D vs 3D texture computation, and different bin
widths for image discretization. Image registration, as a means to re-identify
regions of interest across time points, was evaluated against human-expert
segmented regions at both time points. Even though we found many radiomics
features and preprocessing combinations with a high repeatability (Intraclass
Correlation Coefficient (ICC) > 0.85), our results indicate that overall the
repeatability is highly sensitive to the processing parameters (under certain
configurations, it can be below 0.0). Image normalization, using a variety of
approaches considered, did not result in consistent improvements in
repeatability. There was also no consistent improvement of repeatability
through the use of pre-filtering options, or by using image registration
between timepoints to improve consistency of the region of interest
localization. Based on these results we urge caution when interpreting
radiomics features and advise paying close attention to the processing
configuration details of reported results. Furthermore, we advocate reporting
all processing details in radiomics studies and strongly recommend making the
implementation available.
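The repeatability threshold quoted above (ICC > 0.85) refers to test-retest agreement. A minimal sketch of one common form, the two-way random-effects, absolute-agreement ICC(2,1), computed from scratch with NumPy on made-up feature values (the exact ICC variant used in a given study may differ):

```python
import numpy as np

def icc_2_1(scores: np.ndarray) -> float:
    """Two-way random-effects, absolute-agreement ICC(2,1).

    `scores` has shape (n_subjects, k_repeats), e.g. one radiomics
    feature measured at test and retest time points (k = 2).
    """
    n, k = scores.shape
    grand = scores.mean()
    row_means = scores.mean(axis=1)
    col_means = scores.mean(axis=0)
    # Mean squares from the two-way ANOVA decomposition.
    msr = k * np.sum((row_means - grand) ** 2) / (n - 1)  # between subjects
    msc = n * np.sum((col_means - grand) ** 2) / (k - 1)  # between sessions
    sse = (np.sum((scores - grand) ** 2)
           - (n - 1) * msr - (k - 1) * msc)
    mse = sse / ((n - 1) * (k - 1))                       # residual
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

# Perfectly repeated measurements give ICC = 1.
stable = np.array([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0]])
print(round(icc_2_1(stable), 3))  # -> 1.0
```

Adding even a constant session offset to the retest column lowers ICC(2,1), since the absolute-agreement form penalizes systematic shifts between scans.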
Natural language processing to automatically extract the presence and severity of esophagitis in notes of patients undergoing radiotherapy
Radiotherapy (RT) toxicities can impair survival and quality-of-life, yet
remain under-studied. Real-world evidence holds potential to improve our
understanding of toxicities, but toxicity information is often only in clinical
notes. We developed natural language processing (NLP) models to identify the
presence and severity of esophagitis from notes of patients treated with
thoracic RT. We fine-tuned statistical and pre-trained BERT-based models for
three esophagitis classification tasks: Task 1) presence of esophagitis, Task
2) severe esophagitis or not, and Task 3) no esophagitis vs. grade 1 vs. grade
2-3. Transferability was tested on 345 notes from patients with esophageal
cancer undergoing RT.
Fine-tuning PubmedBERT yielded the best performance. The best macro-F1 was
0.92, 0.82, and 0.74 for Task 1, 2, and 3, respectively. Selecting the most
informative note sections during fine-tuning improved macro-F1 by over 2% for
all tasks. Silver-labeled data improved the macro-F1 by over 3% across all
tasks. For the esophageal cancer notes, the best macro-F1 was 0.73, 0.74, and
0.65 for Task 1, 2, and 3, respectively, without additional fine-tuning.
To our knowledge, this is the first effort to automatically extract
esophagitis toxicity severity according to CTCAE guidelines from clinic notes.
The promising performance provides proof-of-concept for NLP-based automated
detailed toxicity monitoring in expanded domains.
Comment: 17 pages, 6 tables, 1 figure. Submitting to JCO-CCI for review.
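Macro-F1, the metric reported above, is the unweighted mean of per-class F1 scores, so rare severity grades count as much as common ones. A small sketch with toy labels (not the study's data):

```python
# Macro-F1 for a 3-class task like Task 3 above
# (no esophagitis vs. grade 1 vs. grade 2-3).
# The label vectors are invented for illustration.
from sklearn.metrics import f1_score

y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 2, 2, 2]

# average="macro" computes F1 per class, then takes the plain mean,
# ignoring class frequencies.
print(f1_score(y_true, y_pred, average="macro"))
```

Here class 0 scores F1 = 1.0, class 1 scores 2/3, and class 2 scores 0.8, so the macro average is about 0.822, even though overall accuracy is higher for the majority predictions.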
Large Language Models to Identify Social Determinants of Health in Electronic Health Records
Social determinants of health (SDoH) have an important impact on patient
outcomes but are incompletely collected from the electronic health records
(EHR). This study researched the ability of large language models to extract
SDoH from free text in EHRs, where they are most commonly documented, and
explored the role of synthetic clinical text for improving the extraction of
these scarcely documented, yet extremely valuable, clinical data. 800 patient
notes were annotated for SDoH categories, and several transformer-based models
were evaluated. The study also experimented with synthetic data generation and
assessed for algorithmic bias. Our best-performing models were fine-tuned
Flan-T5 XL (macro-F1 0.71) for any SDoH, and Flan-T5 XXL (macro-F1 0.70). The
benefit of augmenting fine-tuning with synthetic data varied across model
architecture and size, with smaller Flan-T5 models (base and large) showing the
greatest improvements in performance (delta F1 +0.12 to +0.23). Model
performance was similar on the in-hospital system dataset but worse on the
MIMIC-III dataset. Our best-performing fine-tuned models outperformed zero- and
few-shot performance of ChatGPT-family models for both tasks. These fine-tuned
models were less likely than ChatGPT to change their prediction when
race/ethnicity and gender descriptors were added to the text, suggesting less
algorithmic bias (p<0.05). At the patient-level, our models identified 93.8% of
patients with adverse SDoH, while ICD-10 codes captured 2.0%. Our method can
effectively extract SDoH information from clinic notes, performing better
compared to GPT zero- and few-shot settings. These models could enhance
real-world evidence on SDoH and aid in identifying patients needing social
support.
Comment: 38 pages, 5 figures, 5 tables in main. Submitted for review.
MEDBERT.de: A Comprehensive German BERT Model for the Medical Domain
This paper presents medBERT.de, a pre-trained German BERT model specifically
designed for the German medical domain. The model has been trained on a large
corpus of 4.7 million German medical documents and has been shown to achieve
new state-of-the-art performance on eight different medical benchmarks covering
a wide range of disciplines and medical document types. In addition to
evaluating the overall performance of the model, this paper also conducts a
more in-depth analysis of its capabilities. We investigate the impact of data
deduplication on the model's performance, as well as the potential benefits of
using more efficient tokenization methods. Our results indicate that
domain-specific models such as medBERT.de are particularly useful for longer
texts, and that deduplication of training data does not necessarily lead to
improved performance. Furthermore, we found that efficient tokenization plays
only a minor role in improving model performance, and attribute most of the
improved performance to the large amount of training data. To encourage further
research, the pre-trained model weights and new benchmarks based on
radiological data are made publicly available for use by the scientific
community.
Comment: Keno K. Bressem, Jens-Michalis Papaioannou and Paul Grundmann
contributed equally.
Imaging biomarker roadmap for cancer studies.
Imaging biomarkers (IBs) are integral to the routine management of patients with cancer. IBs used daily in oncology include clinical TNM stage, objective response and left ventricular ejection fraction. Other CT, MRI, PET and ultrasonography biomarkers are used extensively in cancer research and drug development. New IBs need to be established either as useful tools for testing research hypotheses in clinical trials and research studies, or as clinical decision-making tools for use in healthcare, by crossing 'translational gaps' through validation and qualification. Important differences exist between IBs and biospecimen-derived biomarkers and, therefore, the development of IBs requires a tailored 'roadmap'. Recognizing this need, Cancer Research UK (CRUK) and the European Organisation for Research and Treatment of Cancer (EORTC) assembled experts to review, debate and summarize the challenges of IB validation and qualification. This consensus group has produced 14 key recommendations for accelerating the clinical translation of IBs, which highlight the role of parallel (rather than sequential) tracks of technical (assay) validation, biological/clinical validation and assessment of cost-effectiveness; the need for IB standardization and accreditation systems; the need to continually revisit IB precision; an alternative framework for biological/clinical validation of IBs; and the essential requirements for multicentre studies to qualify IBs for clinical use.
Development of this roadmap received support from Cancer Research UK and the Engineering and Physical Sciences Research Council (grant references A/15267, A/16463, A/16464, A/16465, A/16466 and A/18097), the EORTC Cancer Research Fund, and the Innovative Medicines Initiative Joint Undertaking (grant agreement number 115151), resources of which are composed of financial contribution from the European Union's Seventh Framework Programme (FP7/2007-2013) and European Federation of Pharmaceutical Industries and Associations (EFPIA) companies' in-kind contributions.
FUTURE-AI: International consensus guideline for trustworthy and deployable artificial intelligence in healthcare
Despite major advances in artificial intelligence (AI) for medicine and
healthcare, the deployment and adoption of AI technologies remain limited in
real-world clinical practice. In recent years, concerns have been raised about
the technical, clinical, ethical and legal risks associated with medical AI. To
increase real world adoption, it is essential that medical AI tools are trusted
and accepted by patients, clinicians, health organisations and authorities.
This work describes the FUTURE-AI guideline as the first international
consensus framework for guiding the development and deployment of trustworthy
AI tools in healthcare. The FUTURE-AI consortium was founded in 2021 and
currently comprises 118 inter-disciplinary experts from 51 countries
representing all continents, including AI scientists, clinicians, ethicists,
and social scientists. Over a two-year period, the consortium defined guiding
principles and best practices for trustworthy AI through an iterative process
comprising an in-depth literature review, a modified Delphi survey, and online
consensus meetings. The FUTURE-AI framework was established based on 6 guiding
principles for trustworthy AI in healthcare, i.e. Fairness, Universality,
Traceability, Usability, Robustness and Explainability. Through consensus, a
set of 28 best practices were defined, addressing technical, clinical, legal
and socio-ethical dimensions. The recommendations cover the entire lifecycle of
medical AI, from design, development and validation to regulation, deployment,
and monitoring. FUTURE-AI is a risk-informed, assumption-free guideline which
provides a structured approach for constructing medical AI tools that will be
trusted, deployed and adopted in real-world practice. Researchers are
encouraged to take the recommendations into account at the proof-of-concept
stage to facilitate the future translation of medical AI into clinical practice.
Phylogenetic ctDNA analysis depicts early-stage lung cancer evolution.
The early detection of relapse following primary surgery for non-small-cell lung cancer and the characterization of emerging subclones, which seed metastatic sites, might offer new therapeutic approaches for limiting tumour recurrence. The ability to track the evolutionary dynamics of early-stage lung cancer non-invasively in circulating tumour DNA (ctDNA) has not yet been demonstrated. Here we use a tumour-specific phylogenetic approach to profile the ctDNA of the first 100 TRACERx (Tracking Non-Small-Cell Lung Cancer Evolution Through Therapy (Rx)) study participants, including one patient who was also recruited to the PEACE (Posthumous Evaluation of Advanced Cancer Environment) post-mortem study. We identify independent predictors of ctDNA release and analyse the tumour-volume detection limit. Through blinded profiling of postoperative plasma, we observe evidence of adjuvant chemotherapy resistance and identify patients who are very likely to experience recurrence of their lung cancer. Finally, we show that phylogenetic ctDNA profiling tracks the subclonal nature of lung cancer relapse and metastasis, providing a new approach for ctDNA-driven therapeutic studies.